Binary (Logistic) Regression with Three Predictors

To illustrate a regression model with three predictor variables, we will develop a logistic regression model for obesity as a function of gender, race and age for the US adult (age 18-79) population.

Gender is a binary variable with 2 levels (male and female), race is a categorical variable with 5 levels (White, Black, Mexican, Hispanic, Other), and age is a quantitative variable.

We will model the effect of age using natural cubic splines. In formulating our initial model, we need to specify the number of parameters (df) (complexity or degree of nonlinearity) for the age effect. In most situations, we will default to using a natural cubic spline with 4 df, whihc typically represents an appropriate balance between flexibility and loss of precision caused by overfitting. Small samples (\(N < 30\)) may require the use of 2 df, and moderate samples (\(N < 100\)) may require the use of 3 df. Empirical evidence suggests that more than 4 df are seldom required in a natural cubic spline regression model. Natural cubic spline models with 5 or 6 df should only be considered if the sample size is very large. Based on these considerations, since the effective sample size is fairly large, we will model the effect of age using a natural cubic spline with 5 df. (Using different df for the age effect and for the race effect is also helpful for the clarity of the presentation.)

The choice between a model with additive effects of all three predictor variables or a model with one or more interaction terms should depend primarily on the scientific context and, to a large extent, on the available sample size. The number of parameters \(p\) (regression coefficients) that can be reliably estimated is \(m/15\), where \(m\) is the effective sample size (for logistic regression, \(m\) is the minimum of the number of events and the number of non-events in our sample).

Possible Models

We denote the obesity outcome for subject \(i\) by \(y_i\), the gender for subject \(i\) by \(x_{1i}\), the race for subject \(i\) by \(x_{2i}\), and the age for subject \(i\) by \(x_{3i}\).

A logistic regression model for (prevalence of) obesity as a function of gender, race and age including all possible interaction terms is \[ \begin{eqnarray*} y_i & \sim & \mbox{Bernoulli}\left\{p_i\right\} \\ \mbox{logit}\left\{p_i\right\} & = & \alpha_0 + f_{g}(x_{1i},{\bf \beta}) + f_{r}(x_{2i},{\bf \gamma}) + f_{a}(x_{3i},{\bf \delta}) + \\ & & f_{g,r}(x_{1i},x_{2i},{\bf \beta\gamma}) + f_{g,a}(x_{1i},x_{3i},{\bf \beta\delta}) + f_{r,a}(x_{2i},x_{3i},{\bf \gamma\delta}) + \\ & & f_{g,r,a}(x_{1i},x_{2i},x_{3i},{\bf \beta\gamma\delta}) \\ \end{eqnarray*} \]

For modeling, gender must be coded with \(1~(=2-1)\) numeric variable, and race must be coded with \(4~(=5-1)\) numeric variables, and age will be represented by 5 variables, the basis for the natural cubic spline with 5 df or 6 knots (more generally, \(k\) variables, the basis for the natural cubic spline with \(k\) df or \(k+1\) knots).

The two-way interaction between gender and race is encoded using the \(4 = 1 \times 4\) products of the \(1\) variable used to encode gender and the \(4\) variables used to encode race. The two-way interaction between gender and age is encoded using the \(5 = 1 \times 5\) products of the \(1\) variable used to encode gender and the \(5\) variables used to encode age. The two-way interaction between race and age is encoded using the \(20 = 4 \times 5\) products of the \(4\) variables used to encode race and the \(5\) variables used to encode age.

The three-way interaction between gender, race and age is encoded using the \(20 = 1 \times 4 \times 5\) products of the \(1\) variable used to encode gender, the \(4\) variables used to encode race and the \(5\) variables used to encode age.

The model with all of these terms uses \[59 = 2 \times 5 \times 6 - 1 = 1 + 4 + 5 + 4 + 5 + 20 + 20\] df (regression parameters excluding the intercept).

There are seven simplifications of this model that could be of substantive interest: 1. an additive model for gender, race and age with \(10 = 1 + 4 + 5\) df \[ \begin{eqnarray*} \mbox{logit}\left\{p_i\right\} & = & \alpha_0 + f_{g}(x_{1i},{\bf \beta}) + f_{r}(x_{2i},{\bf \gamma}) + f_{a}(x_{3i},{\bf \delta}) \\ \end{eqnarray*} \] * the effect of gender does not vary by race and/or age * the effect of race does not vary by gender and/or age * the effect of age does not vary by gender and/or race 2. a two-way interaction between gender and race plus an additive effect of age with \(14 = 1 + 4 + 5 + 4\) df \[ \begin{eqnarray*} \mbox{logit}\left\{p_i\right\} & = & \alpha_0 + f_{g}(x_{1i},{\bf \beta}) + f_{r}(x_{2i},{\bf \gamma}) + f_{g,r}(x_{1i},x_{2i},{\bf \beta\gamma}) + f_{a}(x_{3i},{\bf \delta}) \\ \end{eqnarray*} \] * the effect of gender varies by race, but not by age * the effect of race varies by gender, but not by age * the effect of age does not vary by gender and/or race 3. a two-way interaction between gender and age plus an additive effect of race with \(15 = 1 + 4 + 5 + 5\) df \[ \begin{eqnarray*} \mbox{logit}\left\{p_i\right\} & = & \alpha_0 + f_{g}(x_{1i},{\bf \beta}) + f_{r}(x_{2i},{\bf \gamma}) + f_{a}(x_{3i},{\bf \delta}) + f_{g,a}(x_{1i},x_{3i},{\bf \beta\delta}) \\ \end{eqnarray*} \] * the effect of gender varies by age, but not by race * the effect of race does not vary by gender and/or age * the effect of age varies by gender, but not by race 4. a two-way interaction between age and race plus an additive effect of gender with \(30 = 1 + 4 + 5 + 20\) df \[ \begin{eqnarray*} \mbox{logit}\left\{p_i\right\} & = & \alpha_0 + f_{g}(x_{1i},{\bf \beta}) + f_{r}(x_{2i},{\bf \gamma}) + f_{a}(x_{3i},{\bf \delta}) + f_{r,a}(x_{2i},x_{3i},{\bf \gamma\delta}) \\ \end{eqnarray*} \] * the effect of gender does not vary by race and/or age * the effect of race varies by age, but not by gender * the effect of age varies by race, but not by gender 5. a two-way interaction between gender and race plus a two-way interaction between gender and age with \(19 = 1 + 4 + 5 + 4 + 5\) df \[ \begin{eqnarray*} \mbox{logit}\left\{p_i\right\} & = & \alpha_0 + f_{g}(x_{1i},{\bf \beta}) + f_{r}(x_{2i},{\bf \gamma}) + f_{a}(x_{3i},{\bf \delta}) + \\ & & f_{g,r}(x_{1i},x_{2i},{\bf \beta\gamma}) + f_{g,a}(x_{1i},x_{3i},{\bf \beta\delta}) \\ \end{eqnarray*} \] * the effect of gender varies by race and age * the effect of race varies by gender, but not by age * the effect of age varies by gender, but not by race 6. a two-way interaction between race and gender plus a two-way interaction between race and age with \(34 = 1 + 4 + 5 + 4 + 20\) df \[ \begin{eqnarray*} \mbox{logit}\left\{p_i\right\} & = & \alpha_0 + f_{g}(x_{1i},{\bf \beta}) + f_{r}(x_{2i},{\bf \gamma}) + f_{a}(x_{3i},{\bf \delta}) + \\ & & f_{g,r}(x_{1i},x_{2i},{\bf \beta\gamma}) + f_{r,a}(x_{2i},x_{3i},{\bf \gamma\delta}) \\ \end{eqnarray*} \] * the effect of gender varies by race, but not by age * the effect of race varies by gender and age * the effect of age varies by race, but not by gender 7. a two-way interaction between gender and age plus a two-way interaction between race and age with \(35 = 1 + 4 + 5 + 5 + 20\) df \[ \begin{eqnarray*} \mbox{logit}\left\{p_i\right\} & = & \alpha_0 + f_{g}(x_{1i},{\bf \beta}) + f_{r}(x_{2i},{\bf \gamma}) + f_{a}(x_{3i},{\bf \delta}) + \\ & & f_{g,a}(x_{1i},x_{3i},{\bf \beta\delta}) + f_{r,a}(x_{2i},x_{3i},{\bf \gamma\delta}) \\ \end{eqnarray*} \] * the effect of gender varies by age, but not by race * the effect of race varies by age, but not by gender * the effect of age varies by gender and race

The usefulness of these potential simplifications largely depends on the goals of our study. For example, if racial comparisons (the effects of race) are of primary interest, model 3 (the effect of race does not vary by gender and/or age), model 5 (the effect of race varies by gender, but not by age), and model 7 (the effect of race varies by age, but not by gender) represent meaningful simplifications of the full model with a three-way interaction, while the other models are not.

Initial Model: Three-Way Interaction Between Gender, Race and Age

Moving forward, we will assume our primary “exposure” variable of interest is race; gender and age are potential confounders and/or effect modifiers. Given our large effective sample size \((m=2592)\), we will allow for a potential three-way interaction between race, gender and age.

LIBNAME NHANES "../../../BMI552_Datasets/NHANES_May2023/";
*LIBNAME NHANES "~/my_shared_file_links/u48718377/NHANES";
OPTIONS FMTSEARCH=(NHANES.formats_NHANES work) NODATE NONUMBER;
TITLE1; TITLE2;
ODS NOPROCTITLE;
ODS GRAPHICS / LOESSMAXOBS=20000;

DATA Ron;
  SET NHANES.NHANES (RENAME=(Race1=Race));
RUN;

ODS SELECT NONE;
PROC LOGISTIC DATA=Ron;
  ODS SELECT ResponseProfile;
*  ODS SELECT ClassLevelInfo;
*  ODS SELECT SplineKnots;
    CLASS Gender Race / ORDER=INTERNAL;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));
    
  MODEL Obese(EVENT="Yes") = Gender Race AgeCS 
      Gender*Race Gender*AgeCS Race*AgeCS 
      Gender*Race*AgeCS / LINK=LOGIT;

  OUTPUT OUT=Tmp RESCHI=Residual PREDICTED=Fitted; 

  STORE GenderRaceAge;
RUN;
ODS SELECT ALL;
Response Profile
Ordered
Value
Obese Total
Frequency
1 No 4643
2 Yes 2592

Probability modeled is Obese='Yes'.

Assessing Adequacy of Modeling Assumptions

Before using the model, we must evaluate the adequacy of the modeling assumptions. For logistic regression models, the only assumptions are correct structural model and independence (which cannot be evaluated based on the observed data). The only concern is the potential for inadequacies in the representation of the effect of age, e.g. insufficient df. We can evaluate this assumption using plots of the residuals against age by gender and/or race.

PROC SGPANEL DATA=Tmp;
  PANELBY Gender / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
*  PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;

PROC SGPANEL DATA=Tmp;
  PANELBY Race / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
*  PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;

PROC SGPANEL DATA=Tmp;
  PANELBY Race Gender;
    REFLINE 0 / AXIS=Y;
*  PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -2 0 2 4 -2 0 2 4 Gender (Biological Sex) = Female Gender (Biological Sex) = Male

The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -2 0 2 4 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = White
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -2 0 2 4 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Black
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -2 0 2 4 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Mexican
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -2 0 2 4 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Hispanic
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -2 0 2 4 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Other

The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -2 0 2 4 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -2 0 2 4 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -2 0 2 4 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -2 0 2 4 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -2 0 2 4 Pearson Residual

The lack of any clear patterns in any of these plots suggests that the correct structural model assumption is valid.

Alternatively, we can use AIC or BIC to formally assess the adequacy of the model fit.

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=DF5;
    CLASS Gender Race / ORDER=INTERNAL;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));
    
  MODEL Obese(EVENT="Yes") = Gender Race AgeCS 
      Gender*Race Gender*AgeCS Race*AgeCS 
      Gender*Race*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
  ODS OUTPUT GlobalTests=DF6;
    CLASS Gender Race / ORDER=INTERNAL;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(2.5 18.3333 34.1667 50 65.8333 81.6667 97.5));
    
  MODEL Obese(EVENT="Yes") = Gender Race AgeCS 
      Gender*Race Gender*AgeCS Race*AgeCS 
      Gender*Race*AgeCS / LINK=LOGIT;
RUN;

DATA DF5;
  SET DF5;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age 5df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA DF6;
  SET DF6;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age 6df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA ModelComparisons;
  LENGTH Model $20;

  SET DF5 DF6;
    
  KEEP Model DF AIC BIC;    
RUN;

PROC MEANS DATA=ModelComparisons MAX NOPRINT;
  VAR AIC BIC;
  OUTPUT OUT=MaxIC MAX= / AUTONAME;
RUN;

DATA ModelComparisons;
  IF _N_=1 THEN SET MaxIC;
  SET ModelComparisons;
    
  AIC = AIC_Max - AIC;
  BIC = BIC_Max - BIC;
    
  KEEP Model DF AIC BIC;
RUN;
ODS EXCLUDE NONE;

TITLE "Model Comparisons (AIC/BIC)";
PROC PRINT DATA=ModelComparisons NOOBS;
  VAR Model DF AIC BIC;  
    
  FORMAT AIC BIC 6.1;
RUN;
TITLE;

Model Comparisons (AIC/BIC)

Model DF AIC BIC
Age 5df 59 0.0 0.0
Age 6df 69 4.7 63.3

Based on BIC (my preference for considering expansions of the model), there is conclusive evidence for the model using 5 df for the age effect.

Odds Ratios for Race by Gender and Age

ESTIMATE statements can be used to find comparisons of any two races for any combination of gender and age.

ODS EXCLUDE ALL;
PROC PLM RESTORE=GenderRaceAge;
    ODS OUTPUT Estimates=E;

    ESTIMATE "Mexican v Hispanic, Female 30" Race [1, 3] [-1, 4]
        Gender*Race [1, 2 3] [-1, 2 4]
        Race*AgeCS [1, 3 30] [-1, 4 30]
        Gender*Race*AgeCS [1, 2 3 30] [-1, 2 4 30] / EXP CL ALPHA=0.03125;
    ESTIMATE "Mexican v Hispanic, Male 30" Race [1, 3] [-1, 4]
        Gender*Race [1, 1 3] [-1, 1 4]
        Race*AgeCS [1, 3 30] [-1, 4 30]
        Gender*Race*AgeCS [1, 1 3 30] [-1, 1 4 30] / EXP CL ALPHA=0.03125;
    ESTIMATE "Mexican v Hispanic, Female 50" Race [1, 3] [-1, 4]
        Gender*Race [1, 2 3] [-1, 2 4]
        Race*AgeCS [1, 3 50] [-1, 4 50]
        Gender*Race*AgeCS [1, 2 3 50] [-1, 2 4 50] / EXP CL ALPHA=0.03125;
    ESTIMATE "Mexican v Hispanic, Male 50" Race [1, 3] [-1, 4]
        Gender*Race [1, 1 3] [-1, 1 4]
        Race*AgeCS [1, 3 50] [-1, 4 50]
        Gender*Race*AgeCS [1, 1 3 50] [-1, 1 4 50] / EXP CL ALPHA=0.03125;
    ESTIMATE "Mexican v Hispanic, Female 70" Race [1, 3] [-1, 4]
        Gender*Race [1, 2 3] [-1, 2 4]
        Race*AgeCS [1, 3 70] [-1, 4 70]
        Gender*Race*AgeCS [1, 2 3 70] [-1, 2 4 70] / EXP CL ALPHA=0.03125;
    ESTIMATE "Mexican v Hispanic, Male 70" Race [1, 3] [-1, 4]
        Gender*Race [1, 1 3] [-1, 1 4]
        Race*AgeCS [1, 3 70] [-1, 4 70]
        Gender*Race*AgeCS [1, 1 3 70] [-1, 1 4 70] / EXP CL ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
    VAR Label LowerExp ExpEstimate UpperExp;
    LABEL Label='00'x
        LowerExp="Lower CL"
        ExpEstimate="Estimate"
        UpperExp="Upper CL";
    TITLE "Odds Ratios for Race by Gender and Age";
RUN;

PROC SGPLOT DATA=E NOAUTOLEGEND;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  SCATTER Y=Label X=ExpEstimate / XERRORLOWER=LowerExp XERRORUPPER=UpperExp
         MARKERATTRS=(SYMBOL=Plus COLOR=Cx1b9e77) 
         ERRORBARATTRS=(THICKNESS=2 COLOR=Cx1b9e77) NOERRORCAPS;
  REFLINE 1 / AXIS=X;
  XAXIS GRID TYPE=LOG LABEL="Odds Ratio";  /* specify log scale */
  YAXIS GRID DISPLAY=(NOLABEL) DISCRETEORDER=DATA REVERSE;
RUN;
TITLE;

Odds Ratios for Race by Gender and Age

  Lower CL Estimate Upper CL
Mexican v Hispanic, Female 30 0.7741 1.8619 4.4785
Mexican v Hispanic, Male 30 0.8677 2.0123 4.6669
Mexican v Hispanic, Female 50 1.3438 3.9763 11.7661
Mexican v Hispanic, Male 50 0.2866 0.7941 2.2007
Mexican v Hispanic, Female 70 0.1422 0.4814 1.6302
Mexican v Hispanic, Male 70 0.2636 0.9056 3.1114

Odds Ratios for Race by Gender and Age; The SGPlot Procedure 0.1 1 10 Odds Ratio Mexican v Hispanic, Male 70 Mexican v Hispanic, Female 70 Mexican v Hispanic, Male 50 Mexican v Hispanic, Female 50 Mexican v Hispanic, Male 30 Mexican v Hispanic, Female 30 Odds Ratios for Race by Gender and Age

The odds ratios for obesity comparing Mexicans to Hispanics are \((0.77,4.48)\) for 30-year-old females, \((1.34,11.7)\) for 50-year-old females, and \((0.14,1.63)\) for 70-year-old females. The odds ratios for obesity comparing Mexicans to Hispanics are \((0.87,4.67)\) for 30-year-old males, \((0.29,2.20)\) for 50-year-old males, and \((0.26,3.11)\) for 70-year-old males.

ODS EXCLUDE ALL;
PROC PLM RESTORE=GenderRaceAge;
    ODS OUTPUT Estimates=E;

    ESTIMATE "Black v White, Female 30" Race [1, 2] [-1, 1]
        Gender*Race [1, 2 2] [-1, 2 1]
        Race*AgeCS [1, 2 30] [-1, 1 30]
        Gender*Race*AgeCS [1, 2 2 30] [-1, 2 1 30] / EXP CL ALPHA=0.03125;
    ESTIMATE "Black v White, Male 30" Race [1, 2] [-1, 1]
        Gender*Race [1, 1 2] [-1, 1 1]
        Race*AgeCS [1, 2 30] [-1, 1 30]
        Gender*Race*AgeCS [1, 1 2 30] [-1, 1 1 30] / EXP CL ALPHA=0.03125;
    ESTIMATE "Black v White, Female 50" Race [1, 2] [-1, 1]
        Gender*Race [1, 2 2] [-1, 2 1]
        Race*AgeCS [1, 2 50] [-1, 1 50]
        Gender*Race*AgeCS [1, 2 2 50] [-1, 2 1 50] / EXP CL ALPHA=0.03125;
    ESTIMATE "Black v White, Male 50" Race [1, 2] [-1, 1]
        Gender*Race [1, 1 2] [-1, 1 1]
        Race*AgeCS [1, 2 50] [-1, 1 50]
        Gender*Race*AgeCS [1, 1 2 50] [-1, 1 1 50] / EXP CL ALPHA=0.03125;
    ESTIMATE "Black v White, Female 70" Race [1, 2] [-1, 1]
        Gender*Race [1, 2 2] [-1, 2 1]
        Race*AgeCS [1, 2 70] [-1, 1 70]
        Gender*Race*AgeCS [1, 2 2 70] [-1, 2 1 70] / EXP CL ALPHA=0.03125;
    ESTIMATE "Black v White, Male 70" Race [1, 2] [-1, 1]
        Gender*Race [1, 1 2] [-1, 1 1]
        Race*AgeCS [1, 2 70] [-1, 1 70]
        Gender*Race*AgeCS [1, 1 2 70] [-1, 1 1 70] / EXP CL ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
    VAR Label LowerExp ExpEstimate UpperExp;
    LABEL Label='00'x
        LowerExp="Lower CL"
        ExpEstimate="Estimate"
        UpperExp="Upper CL";
    TITLE "Odds Ratios for Race by Gender and Age";
RUN;

PROC SGPLOT DATA=E NOAUTOLEGEND;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  SCATTER Y=Label X=ExpEstimate / XERRORLOWER=LowerExp XERRORUPPER=UpperExp
         MARKERATTRS=(SYMBOL=Plus COLOR=Cx1b9e77) 
         ERRORBARATTRS=(THICKNESS=2 COLOR=Cx1b9e77) NOERRORCAPS;
  REFLINE 1 / AXIS=X;
  XAXIS GRID TYPE=LOG LABEL="Odds Ratio";  /* specify log scale */
  YAXIS GRID DISPLAY=(NOLABEL) DISCRETEORDER=DATA REVERSE;
RUN;
TITLE;

Odds Ratios for Race by Gender and Age

  Lower CL Estimate Upper CL
Black v White, Female 30 2.7013 4.7846 8.4745
Black v White, Male 30 0.4777 0.8614 1.5533
Black v White, Female 50 1.5222 2.5676 4.3307
Black v White, Male 50 0.4343 0.7579 1.3225
Black v White, Female 70 1.0474 1.7483 2.9182
Black v White, Male 70 0.7321 1.3078 2.3364

Odds Ratios for Race by Gender and Age; The SGPlot Procedure 1 2 3 4 5 6 7 8 Odds Ratio Black v White, Male 70 Black v White, Female 70 Black v White, Male 50 Black v White, Female 50 Black v White, Male 30 Black v White, Female 30 Odds Ratios for Race by Gender and Age

The odds ratios for obesity comparing Blacks to Whites are \((2.70,8.47)\) for 30-year-old females, \((1.52,4.33)\) for 50-year-old females, and \((1.05,2.92)\) for 70-year-old females. The odds ratios for obesity comparing Blacks to Whites are \((0.48,1.55)\) for 30-year-old males, \((0.43,1.32)\) for 50-year-old males, and \((0.73,2.34)\) for 70-year-old males.

Prevalence by Race, Gender and Age

ESTIMATE statements can be used to obtain the prevalence of obesity for any combination of race, gender and age. Note that, in the context of the current model, the prevalence of obesity cannot be obtained for any racial group without specifying the age and gender. There is no single prevalence of obesity for Blacks.

More generally, one can only estimate proportions by specifying values for all of the covariates in the regression model.

ODS EXCLUDE ALL;
PROC PLM RESTORE=GenderRaceAge;
    ODS OUTPUT Estimates=E;

    ESTIMATE "Mexican, Female 30" Intercept 1 Gender [1, 2] Race [1, 3] AgeCS [1, 30]
        Gender*Race [1, 2 3] Gender*AgeCS [1, 2 30] Race*AgeCS [1, 3 30]
        Gender*Race*AgeCS [1, 2 3 30] / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Hispanic, Female 30" Intercept 1 Gender [1, 2] Race [1, 4] AgeCS [1, 30]
        Gender*Race [1, 2 4] Gender*AgeCS [1, 2 30] Race*AgeCS [1, 4 30]
        Gender*Race*AgeCS [1, 2 4 30] / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Mexican, Male 30" Intercept 1 Gender [1, 1] Race [1, 3] AgeCS [1, 30]
        Gender*Race [1, 1 3] Gender*AgeCS [1, 1 30] Race*AgeCS [1, 3 30]
        Gender*Race*AgeCS [1, 1 3 30] / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Hispanic, Male 30" Intercept 1 Gender [1, 1] Race [1, 4] AgeCS [1, 30]
        Gender*Race [1, 1 4] Gender*AgeCS [1, 1 30] Race*AgeCS [1, 4 30]
        Gender*Race*AgeCS [1, 1 4 30] / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Mexican, Female 50" Intercept 1 Gender [1, 2] Race [1, 3] AgeCS [1, 50]
        Gender*Race [1, 2 3] Gender*AgeCS [1, 2 50] Race*AgeCS [1, 3 50]
        Gender*Race*AgeCS [1, 2 3 50] / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Hispanic, Female 50" Intercept 1 Gender [1, 2] Race [1, 4] AgeCS [1, 50]
        Gender*Race [1, 2 4] Gender*AgeCS [1, 2 50] Race*AgeCS [1, 4 50]
        Gender*Race*AgeCS [1, 2 4 50] / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Mexican, Male 50" Intercept 1 Gender [1, 1] Race [1, 3] AgeCS [1, 50]
        Gender*Race [1, 1 3] Gender*AgeCS [1, 1 50] Race*AgeCS [1, 3 50]
        Gender*Race*AgeCS [1, 1 3 50] / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Hispanic, Male 50" Intercept 1 Gender [1, 1] Race [1, 4] AgeCS [1, 50]
        Gender*Race [1, 1 4] Gender*AgeCS [1, 1 50] Race*AgeCS [1, 4 50]
        Gender*Race*AgeCS [1, 1 4 50] / EXP ILINK ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
    VAR Label LowerMu Mu UpperMu;
    LABEL Label='00'x
        Mu="Estimate"
        LowerMu="Lower CL"
        UpperMu="Upper CL";
    TITLE "Prevalence of Obesity";
RUN;
TITLE;

Prevalence of Obesity

  Lower CL Estimate Upper CL
Mexican, Female 30 0.3015 0.4267 0.5620
Hispanic, Female 30 0.1673 0.2855 0.4430
Mexican, Male 30 0.3127 0.4321 0.5600
Hispanic, Male 30 0.1627 0.2744 0.4239
Mexican, Female 50 0.4708 0.6382 0.7776
Hispanic, Female 50 0.1605 0.3073 0.5072
Mexican, Male 50 0.2204 0.3468 0.4993
Hispanic, Male 50 0.2308 0.4007 0.5983

The estimated regression functions (log odds of obesity as a function of race, gender and age) can be plotted using PROC SGPLOT.

ODS EXCLUDE ALL;  
DATA AllGendersAllRacesAllAges;
  DO Gender = 1 TO 2 BY 1;
    DO Race = 1 TO 5 BY 1;
      DO Age = 18 TO 79 BY 1;
        OUTPUT;
      END;
    END;
  END;
      
  FORMAT Gender GenderF.;
  FORMAT Race Race1F.;
RUN;

PROC PLM RESTORE=GenderRaceAge;
  SCORE DATA=AllGendersAllRacesAllAges OUT=Fitted Predicted LCLM=Lower_Logit UCLM=Upper_Logit / ALPHA=0.03125;
RUN;
      
DATA Fitted;
  SET Fitted;
      
  Lower_Prob = 1/(1+EXP(-Lower_Logit));
  Upper_Prob = 1/(1+EXP(-Upper_Logit));
RUN;      
ODS EXCLUDE NONE;

PROC SGPANEL DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  PANELBY Gender;
  BAND X=Age Lower=Lower_Logit Upper=Upper_Logit /
      GROUP=Race TRANSPARENCY=0.7;
  ROWAXIS LABEL="Logit (Log Odds) of Obesity";
RUN;
The SGPanel Procedure Other Hispanic Mexican Black White Race Age Gender = Female Gender = Male 20 40 60 80 20 40 60 80 -8 -6 -4 -2 0 2 Logit (Log Odds) of Obesity

Alternatively, the estimated prevalence of obesity can also be plotted using PROC SGPLOT.

PROC SGPANEL DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  PANELBY Gender;
  BAND X=Age Lower=Lower_Prob Upper=Upper_Prob /
      GROUP=Race TRANSPARENCY=0.7;
  ROWAXIS LABEL="Prevalence of Obesity";
RUN;
The SGPanel Procedure Other Hispanic Mexican Black White Race Age Gender = Female Gender = Male 20 40 60 80 20 40 60 80 0.0 0.2 0.4 0.6 0.8 Prevalence of Obesity

Hypothesis Tests

Likelihood ratio tests can be used to assess the evidence for (1) any effect of race, (2) varying effects of race by gender, (3) varying effects of race by age, and (4) varying effects of race by age and/or gender.

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=Full;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race Gender*AgeCS Race*AgeCS
        Gender*Race*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=GenderRace_GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race Gender*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=RaceAge_GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*AgeCS Race*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=Race_GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender AgeCS
        Gender*AgeCS / LINK=LOGIT;
RUN;

DATA Full;
    SET Full;

    WHERE Test="Likelihood Ratio";
    RENAME ChiSq=ChiSq_Full DF=DF_Full;
RUN;

DATA GenderRace_GenderAge;
    LENGTH Model $50 Test $50;

    SET GenderRace_GenderAge;

    WHERE Test="Likelihood Ratio";

    Model = "Gender*Race Only";
    Test = "Race Effects Vary by Age";

    KEEP Model Test ChiSq DF;
RUN;

DATA RaceAge_GenderAge;
    LENGTH Model $50 Test $50;

    SET RaceAge_GenderAge;

    WHERE Test="Likelihood Ratio";

    Model = "Race*Age Only";
    Test = "Race Effects Vary by Gender";

    KEEP Model Test ChiSq DF;
RUN;

DATA Race_GenderAge;
    LENGTH Model $50 Test $50;

    SET Race_GenderAge;

    WHERE Test="Likelihood Ratio";

    Model = "Race Additive";
    Test = "Race Effects Vary by Gender and/or Age";

    KEEP Model Test ChiSq DF;
RUN;

DATA GenderAge;
    LENGTH Model $50 Test $50;

    SET GenderAge;

    WHERE Test="Likelihood Ratio";

    Model = "Race Omitted";
    Test = "Overall Effect of Race";

    KEEP Model Test ChiSq DF;
RUN;

DATA LRTests;
    LENGTH Model $50 Test $50;

    IF _N_ = 1 THEN SET Full;
    SET GenderRace_GenderAge RaceAge_GenderAge Race_GenderAge GenderAge;

    Chisq = Chisq_Full-Chisq;
    DF = DF_Full - DF;
    ProbChiSq = EXP(LOGSDF("CHISQUARE",Chisq,DF));
    sValue = -LOG(ProbChiSq)/LOG(2);

    KEEP Test DF ChiSq ProbChiSq sValue;
RUN;
ODS EXCLUDE NONE;

TITLE "Likelihood Ratio Tests";
PROC PRINT DATA=LRTests NOOBS;
    VAR Test DF ChiSq ProbChiSq sValue;

    FORMAT sValue 6.2;
    FORMAT ChiSq 6.2;
RUN;
TITLE;

Likelihood Ratio Tests

Test DF ChiSq ProbChiSq sValue
Race Effects Vary by Age 40 84.51 <.0001 14.28
Race Effects Vary by Gender 24 91.18 <.0001 30.02
Race Effects Vary by Gender and/or Age 44 131.08 <.0001 32.76
Overall Effect of Race 48 249.26 <.0001 93.85

There is overwhelming evidence that the prevalence of obesity varies by race \((p \approx 0, s=94)\). There is very strong evidence of effect modification by age \((p \approx 0, s=14)\) and gender \((p \approx 0, s=30)\).

Initial Model: Additive Model

For illustration, we will revisit the modeling process starting with the additive model.

ODS SELECT NONE;
PROC LOGISTIC DATA=Ron;
*  ODS SELECT ResponseProfile;
*  ODS SELECT ClassLevelInfo;
*  ODS SELECT SplineKnots;
    CLASS Gender Race / ORDER=INTERNAL;
  EFFECT AgeCS=SPLINE(Age / 
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));
    
  MODEL Obese(EVENT="Yes") = Gender Race AgeCS / LINK=LOGIT;

  OUTPUT OUT=Tmp RESCHI=Residual PREDICTED=Fitted; 

  STORE GenderRaceAge;
RUN;
ODS SELECT ALL;

Assessing Adequacy of Modeling Assumptions

Before using the model, we must evaluate the adequacy of the modeling assumptions. For logistic regression models, the only assumptions are correct structural model and independence (which cannot be evaluated based on the observed data). The only concern is the potential for inadequacies in the representation of the effect of age, e.g. insufficient df, or the omission of required interaction terms. We can evaluate this assumption using plots of the residuals.

PROC SGPANEL DATA=Tmp;
  PANELBY Gender / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
*  PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;

PROC SGPANEL DATA=Tmp;
  PANELBY Race / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
*  PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;

PROC SGPANEL DATA=Tmp;
  PANELBY Race Gender;
    REFLINE 0 / AXIS=Y;
*  PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 -1 0 1 2 Gender (Biological Sex) = Female Gender (Biological Sex) = Male

The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = White
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Black
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Mexican
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Hispanic
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Other

The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual

There are some indications of a lack of model fit, particularly the clear trends in the residual plot for Hispanics.

Alternatively, we can use AIC or BIC to formally assess the adequacy of the model fit.

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=DF5;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=DF6;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(2.5 18.3333 34.1667 50 65.8333 81.6667 97.5));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=ThreeWay;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race Gender*AgeCS Race*AgeCS
        Gender*Race*AgeCS / LINK=LOGIT;
RUN;
        
PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=RaceAge_GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race Race*AgeCS / LINK=LOGIT;
RUN;        

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=GenderRace_GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race Gender*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=GenderRace_RaceAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race Race*AgeCS / LINK=LOGIT;
RUN;
        
PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=Race_GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*AgeCS / LINK=LOGIT;
RUN;
        
PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=Gender_RaceAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Race*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=Age_GenderRace;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race / LINK=LOGIT;
RUN;

DATA DF5;
  LENGTH Model $40;
  SET DF5;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age 5df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA DF6;
  LENGTH Model $40;
  SET DF6;
  WHERE Test="Likelihood Ratio";
    
  Model = "Age 6df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;
      
DATA ThreeWay;
  LENGTH Model $40;
  SET ThreeWay;
  WHERE Test="Likelihood Ratio";
    
  Model = "Gender x Race x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;
        
DATA RaceAge_GenderAge;
  LENGTH Model $40;
  SET RaceAge_GenderAge;
  WHERE Test="Likelihood Ratio";
    
  Model = "Gender x Age + Race x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;
        
DATA GenderRace_GenderAge;
  LENGTH Model $40;
  SET GenderRace_GenderAge;
  WHERE Test="Likelihood Ratio";
    
  Model = "Gender x Race + Gender x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;
        
DATA GenderRace_RaceAge;
  LENGTH Model $40;
  SET GenderRace_RaceAge;
  WHERE Test="Likelihood Ratio";
    
  Model = "Gender x Age + Race x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA Race_GenderAge;
  LENGTH Model $40;
  SET Race_GenderAge;
  WHERE Test="Likelihood Ratio";
    
  Model = "Gender x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;
        
DATA Gender_RaceAge;
  LENGTH Model $40;
  SET Gender_RaceAge;
  WHERE Test="Likelihood Ratio";
    
  Model = "Race x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;

DATA Age_GenderRace;
  LENGTH Model $40;
  SET Age_GenderRace;
  WHERE Test="Likelihood Ratio";
    
  Model = "Gender x Race";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;    
RUN;
        
DATA ModelComparisons;
  LENGTH Model $40;

  SET DF5 DF6 
      ThreeWay RaceAge_GenderAge GenderRace_GenderAge GenderRace_RaceAge 
      Race_GenderAge Gender_RaceAge Age_GenderRace;
    
  KEEP Model DF AIC BIC;    
RUN;

PROC MEANS DATA=ModelComparisons MAX NOPRINT;
  VAR AIC BIC;
  OUTPUT OUT=MaxIC MAX= / AUTONAME;
RUN;

DATA ModelComparisons;
  IF _N_=1 THEN SET MaxIC;
  SET ModelComparisons;
    
  AIC = AIC_Max - AIC;
  BIC = BIC_Max - BIC;
    
  KEEP Model DF AIC BIC;
RUN;
ODS EXCLUDE NONE;

TITLE "Model Comparisons (AIC/BIC)";
PROC PRINT DATA=ModelComparisons NOOBS;
  VAR Model DF AIC BIC;  
    
  FORMAT AIC BIC 6.1;
RUN;
TITLE;

Model Comparisons (AIC/BIC)

Model DF AIC BIC
Age 5df 10 42.4 13.2
Age 6df 11 43.5 20.2
Gender x Race x Age 59 0.0 257.9
Gender x Age + Race x Age 34 6.5 118.0
Gender x Race + Gender x Age 19 4.5 28.0
Gender x Age + Race x Age 34 6.5 118.0
Gender x Age 15 43.1 43.1
Race x Age 30 42.4 130.4
Gender x Race 14 5.8 0.0

Based on BIC, the optimal model is a two-way interaction between gender and race with an additive effect of age. This model is decisively better than our initial model (or any of the other models), so we will move forward using this revised model.

Revised Model: Gender-Race Interaction + Age

ODS SELECT NONE;
PROC LOGISTIC DATA=Ron;
*  ODS SELECT ResponseProfile;
*  ODS SELECT ClassLevelInfo;
*  ODS SELECT SplineKnots;
    CLASS Gender Race / ORDER=INTERNAL;
  EFFECT AgeCS=SPLINE(Age /
      NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

  MODEL Obese(EVENT="Yes") = Gender Race Gender*Race AgeCS / LINK=LOGIT;

  OUTPUT OUT=Tmp RESCHI=Residual PREDICTED=Fitted;

  STORE GenderRaceAge;
RUN;
ODS SELECT ALL;

Assessing Adequacy of Modeling Assumptions

Before using the model, we must evaluate the adequacy of the modeling assumptions. For logistic regression models, the only assumptions are correct structural model and independence (which cannot be evaluated based on the observed data). The only concern is the potential for inadequacies in the representation of the effect of age, e.g. insufficient df, or the omission of required interaction terms. We can evaluate this assumption using plots of the residuals.

PROC SGPANEL DATA=Tmp;
  PANELBY Gender / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
*  PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;

PROC SGPANEL DATA=Tmp;
  PANELBY Race / LAYOUT=ROWLATTICE;
    REFLINE 0 / AXIS=Y;
*  PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;

PROC SGPANEL DATA=Tmp;
  PANELBY Race Gender;
    REFLINE 0 / AXIS=Y;
* PBSPLINE X=Age Y=Residual / JITTER; *large samples;
  LOESS X=Age Y=Residual / JITTER; *small smaples;
RUN;
The SGPanel Procedure Loess Pearson Residual 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 -1 0 1 2 Gender (Biological Sex) = Female Gender (Biological Sex) = Male

The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = White
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Black
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Mexican
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Hispanic
The SGPanel Procedure Loess 20 30 40 50 60 70 80 Age (yrs) -1 0 1 2 Pearson Residual Self-Reported Race (Mexican, Hispanic, White, Black, Other) = Other

The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual
The SGPanel Procedure Loess Age (yrs) Gender (Biological Sex) = Female Self-Reported Race (Mexican, Hispanic, White, Bl ... Gender (Biological Sex) = Male Self-Reported Race (Mexican, Hispanic, White, Bl ... 20 30 40 50 60 70 80 20 30 40 50 60 70 80 -1 0 1 2 Pearson Residual

There are some indications of a lack of model fit, particularly the clear trends in the residual plot for Hispanics.

Alternatively, we can use AIC or BIC to formally assess the adequacy of the model fit.

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=DF5;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race Gender*Race AgeCS  / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=DF6;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(2.5 18.3333 34.1667 50 65.8333 81.6667 97.5));

    MODEL Obese(EVENT="Yes") = Gender Race Gender*Race AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=ThreeWay;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race Gender*AgeCS Race*AgeCS
        Gender*Race*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=GenderRace_GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race Gender*Race AgeCS
         Gender*AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=GenderRace_RaceAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race Gender*Race AgeCS
        Race*AgeCS / LINK=LOGIT;
RUN;

DATA DF5;
  LENGTH Model $40;
  SET DF5;
  WHERE Test="Likelihood Ratio";

  Model = "Gender x Race + Age 5df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;
RUN;

DATA DF6;
  LENGTH Model $40;
  SET DF6;
  WHERE Test="Likelihood Ratio";

  Model = "Gender x Race + Age 6df";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;
RUN;

DATA ThreeWay;
  LENGTH Model $40;
  SET ThreeWay;
  WHERE Test="Likelihood Ratio";

  Model = "Gender x Race x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;
RUN;

DATA GenderRace_GenderAge;
  LENGTH Model $40;
  SET GenderRace_GenderAge;
  WHERE Test="Likelihood Ratio";

  Model = "Gender x Race + Gender x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;
RUN;

DATA GenderRace_RaceAge;
  LENGTH Model $40;
  SET GenderRace_RaceAge;
  WHERE Test="Likelihood Ratio";

  Model = "Gender x Race + Race x Age";
  AIC = ChiSq - 2*(DF+1);
  BIC = ChiSq - LOG(2592)*(DF+1);

  KEEP Model DF AIC BIC;
RUN;

DATA ModelComparisons;
  LENGTH Model $40;

  SET DF5 DF6
      ThreeWay GenderRace_GenderAge GenderRace_RaceAge;

  KEEP Model DF AIC BIC;
RUN;

PROC MEANS DATA=ModelComparisons MAX NOPRINT;
  VAR AIC BIC;
  OUTPUT OUT=MaxIC MAX= / AUTONAME;
RUN;

DATA ModelComparisons;
  IF _N_=1 THEN SET MaxIC;
  SET ModelComparisons;

  AIC = AIC_Max - AIC;
  BIC = BIC_Max - BIC;

  KEEP Model DF AIC BIC;
RUN;
ODS EXCLUDE NONE;

TITLE "Model Comparisons (AIC/BIC)";
PROC PRINT DATA=ModelComparisons NOOBS;
  VAR Model DF AIC BIC;

  FORMAT AIC BIC 6.1;
RUN;
TITLE;

Model Comparisons (AIC/BIC)

Model DF AIC BIC
Gender x Race + Age 5df 14 5.8 0.0
Gender x Race + Age 6df 15 6.9 6.9
Gender x Race x Age 59 0.0 257.9
Gender x Race + Gender x Age 19 4.5 28.0
Gender x Race + Race x Age 34 6.5 118.0

Based on BIC, our current model, a two-way interaction between gender and race with an additive effect of age, is the optimal model and decisively better than the other models under consideration, so we will proceed with inference.

Odds Ratios for Race by Gender

ESTIMATE statements can be used to find comparisons of any two races for either gender and any fixed age.

ODS EXCLUDE ALL;
PROC PLM RESTORE=GenderRaceAge;
  ODS OUTPUT Estimates=E;

  ESTIMATE "Black v. White, Female" Race [1, 2] [-1, 1] 
      Gender*Race [1, 2 2] [-1, 2 1] / CL EXP ALPHA=0.03125;
  ESTIMATE "Mexican v. White, Female" Race [1, 3] [-1, 1]  
      Gender*Race [1, 2 3] [-1, 2 1] / CL EXP ALPHA=0.03125;
  ESTIMATE "Hispanic v. White, Female" Race [1, 4] [-1, 1]  
      Gender*Race [1, 2 4] [-1, 2 1] / CL EXP ALPHA=0.03125;
  ESTIMATE "Other v. White, Female" Race [1, 5] [-1, 1]  
      Gender*Race [1, 2 5] [-1, 2 1] / CL EXP ALPHA=0.03125;

  ESTIMATE "Black v. White, Male" Race [1, 2] [-1, 1] 
      Gender*Race [1, 1 2] [-1, 1 1] / CL EXP ALPHA=0.03125;
  ESTIMATE "Mexican v. White, Male" Race [1, 3] [-1, 1]  
      Gender*Race [1, 1 3] [-1, 1 1] / CL EXP ALPHA=0.03125;
  ESTIMATE "Hispanic v. White, Male" Race [1, 4] [-1, 1]  
      Gender*Race [1, 1 3] [-1, 1 1] / CL EXP ALPHA=0.03125;
  ESTIMATE "Other v. White, Male" Race [1, 5] [-1, 1]  
      Gender*Race [1, 1 5] [-1, 1 1] / CL EXP ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;

PROC PRINT DATA=E NOOBS LABEL;
    VAR Label LowerExp ExpEstimate UpperExp;
    LABEL Label='00'x
        LowerExp="Lower CL"
        ExpEstimate="Estimate"
        UpperExp="Upper CL";
    TITLE "Odds Ratios for Race by Gender and Age";
RUN;

PROC SGPLOT DATA=E NOAUTOLEGEND;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  SCATTER Y=Label X=ExpEstimate / XERRORLOWER=LowerExp XERRORUPPER=UpperExp
         MARKERATTRS=(SYMBOL=Plus COLOR=Cx1b9e77) 
         ERRORBARATTRS=(THICKNESS=2 COLOR=Cx1b9e77) NOERRORCAPS;
  REFLINE 1 / AXIS=X;
  XAXIS GRID TYPE=LOG LABEL="Odds Ratio";  /* specify log scale */
  YAXIS GRID DISPLAY=(NOLABEL) DISCRETEORDER=DATA REVERSE;
RUN;
TITLE;

Odds Ratios for Race by Gender and Age

  Lower CL Estimate Upper CL
Black v. White, Female 2.2714 2.8560 3.5911
Mexican v. White, Female 1.3065 1.7399 2.3172
Hispanic v. White, Female 0.7279 1.0093 1.3996
Other v. White, Female 0.3762 0.5238 0.7294
Black v. White, Male 0.8507 1.0892 1.3945
Mexican v. White, Male 0.9083 1.1778 1.5272
Hispanic v. White, Male 0.6639 0.8972 1.2127
Other v. White, Male 0.4232 0.5858 0.8109

Odds Ratios for Race by Gender and Age; The SGPlot Procedure 0.5 1 1.5 2 2.5 3 3.5 Odds Ratio Other v. White, Male Hispanic v. White, Male Mexican v. White, Male Black v. White, Male Other v. White, Female Hispanic v. White, Female Mexican v. White, Female Black v. White, Female Odds Ratios for Race by Gender and Age

For females, the odds ratios for obesity (adjusted for age) are \((2.27,3.59)\) comparing Blacks to Whites, \((1.31,2.32)\) comparing Mexicans to Whites, \((0.73,1.40)\) comparing Hispanics to Whites and \((0.38,0.73)\) comparing Other Races to Whites; for males, the odds ratios for obesity (adjusted for age) are \((0.85,1.39)\) comparing Blacks to Whites, \((0.91,1.53)\) comparing Mexicans to Whites, \((0.66,1.21)\) comparing Hispanics to Whites and \((0.42,0.81)\) comparing Other Races to Whites

Prevalence by Race, Gender and Age

ESTIMATE statements can be used to obtain the prevalence of obesity for any combination of race, gender and age. Note that, in the context of the current model, the prevalence of obesity cannot be obtained for any racial group without specifying the age and gender. There is no single prevalence of obesity for Blacks.

ODS EXCLUDE ALL;
PROC PLM RESTORE=GenderRaceAge;
    ODS OUTPUT Estimates=E;

    ESTIMATE "Mexican, Female 30" Intercept 1 Gender [1, 2] Race [1, 3] AgeCS [1, 30]
        Gender*Race [1, 2 3] / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Hispanic, Female 30" Intercept 1 Gender [1, 2] Race [1, 4] AgeCS [1, 30]
        Gender*Race [1, 2 4]  / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Mexican, Male 30" Intercept 1 Gender [1, 1] Race [1, 3] AgeCS [1, 30]
        Gender*Race [1, 1 3]  / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Hispanic, Male 30" Intercept 1 Gender [1, 1] Race [1, 4] AgeCS [1, 30]
        Gender*Race [1, 1 4]  / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Mexican, Female 50" Intercept 1 Gender [1, 2] Race [1, 3] AgeCS [1, 50]
        Gender*Race [1, 2 3]  / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Hispanic, Female 50" Intercept 1 Gender [1, 2] Race [1, 4] AgeCS [1, 50]
        Gender*Race [1, 2 4]  / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Mexican, Male 50" Intercept 1 Gender [1, 1] Race [1, 3] AgeCS [1, 50]
        Gender*Race [1, 1 3]  / EXP ILINK ALPHA=0.03125;
    ESTIMATE "Hispanic, Male 50" Intercept 1 Gender [1, 1] Race [1, 4] AgeCS [1, 50]
        Gender*Race [1, 1 4]  / EXP ILINK ALPHA=0.03125;
RUN;
ODS EXCLUDE NONE;
PROC PRINT DATA=E NOOBS LABEL;
    VAR Label LowerMu Mu UpperMu;
    LABEL Label='00'x
        Mu="Estimate"
        LowerMu="Lower CL"
        UpperMu="Upper CL";
    TITLE "Prevalence of Obesity";
RUN;
TITLE;

Prevalence of Obesity

  Lower CL Estimate Upper CL
Mexican, Female 30 0.3863 0.4572 0.5298
Hispanic, Female 30 0.2597 0.3282 0.4049
Mexican, Male 30 0.3365 0.3985 0.4639
Hispanic, Male 30 0.3221 0.3986 0.4803
Mexican, Female 50 0.3842 0.4553 0.5282
Hispanic, Female 50 0.2578 0.3265 0.4036
Mexican, Male 50 0.3346 0.3966 0.4621
Hispanic, Male 50 0.3199 0.3967 0.4790

The estimated regression functions (log odds of obesity as a function of race, gender and age) can be plotted using PROC SGPLOT.

ODS EXCLUDE ALL;  
DATA AllGendersAllRacesAllAges;
  DO Gender = 1 TO 2 BY 1;
    DO Race = 1 TO 5 BY 1;
      DO Age = 18 TO 79 BY 1;
        OUTPUT;
      END;
    END;
  END;
      
  FORMAT Gender GenderF.;
  FORMAT Race Race1F.;
RUN;

PROC PLM RESTORE=GenderRaceAge;
  SCORE DATA=AllGendersAllRacesAllAges OUT=Fitted Predicted LCLM=Lower_Logit UCLM=Upper_Logit / ALPHA=0.03125;
RUN;
      
DATA Fitted;
  SET Fitted;
      
  Lower_Prob = 1/(1+EXP(-Lower_Logit));
  Upper_Prob = 1/(1+EXP(-Upper_Logit));
RUN;      
ODS EXCLUDE NONE;

PROC SGPANEL DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  PANELBY Gender;
  BAND X=Age Lower=Lower_Logit Upper=Upper_Logit /
      GROUP=Race TRANSPARENCY=0.7;
  ROWAXIS LABEL="Logit (Log Odds) of Obesity";
RUN;
The SGPanel Procedure Other Hispanic Mexican Black White Race Age Gender = Female Gender = Male 20 40 60 80 20 40 60 80 -2 -1 0 1 Logit (Log Odds) of Obesity

Alternatively, the estimated prevalence of obesity can also be plotted using PROC SGPLOT.

PROC SGPANEL DATA=Fitted;
  STYLEATTRS DATACOLORS=(Cx1b9e77 Cxd95f02 Cx7570b3 Cxe7298a Cx66a61e Cxe6ab02 Cxa6761d Cx666666);
  PANELBY Gender;
  BAND X=Age Lower=Lower_Prob Upper=Upper_Prob /
      GROUP=Race TRANSPARENCY=0.7;
  ROWAXIS LABEL="Prevalence of Obesity";
RUN;
The SGPanel Procedure Other Hispanic Mexican Black White Race Age Gender = Female Gender = Male 20 40 60 80 20 40 60 80 0.2 0.4 0.6 Prevalence of Obesity

Hypothesis Tests

Likelihood ratio tests can be used to assess the evidence for (1) any effect of race and (2) varying effects of race by gender.

ODS EXCLUDE ALL;
PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=Full;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS
        Gender*Race / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=Additive;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender Race AgeCS / LINK=LOGIT;
RUN;

PROC LOGISTIC DATA=Ron;
    ODS OUTPUT GlobalTests=GenderAge;

    CLASS Gender Race / ORDER=INTERNAL;
    EFFECT AgeCS=SPLINE(Age /
        NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILELIST(5 23 41 59 77 95));

    MODEL Obese(EVENT="Yes") = Gender AgeCS / LINK=LOGIT;
RUN;

DATA Full;
    SET Full;

    WHERE Test="Likelihood Ratio";
    RENAME ChiSq=ChiSq_Full DF=DF_Full;
RUN;

DATA Additive;
    LENGTH Model $50 Test $50;

    SET Additive;

    WHERE Test="Likelihood Ratio";

    Model = "Additive";
    Test = "Race Effects Vary by Gender";

    KEEP Model Test ChiSq DF;
RUN;

DATA GenderAge;
    LENGTH Model $50 Test $50;

    SET GenderAge;

    WHERE Test="Likelihood Ratio";

    Model = "Race Omitted";
    Test = "Overall Effect of Race";

    KEEP Model Test ChiSq DF;
RUN;

DATA LRTests;
    LENGTH Model $50 Test $50;

    IF _N_ = 1 THEN SET Full;
    SET Additive GenderAge;

    Chisq = Chisq_Full-Chisq;
    DF = DF_Full - DF;
    ProbChiSq = EXP(LOGSDF("CHISQUARE",Chisq,DF));
    sValue = -LOG(ProbChiSq)/LOG(2);

    KEEP Test DF ChiSq ProbChiSq sValue;
RUN;
ODS EXCLUDE NONE;

TITLE "Likelihood Ratio Tests";
PROC PRINT DATA=LRTests NOOBS;
    VAR Test DF ChiSq ProbChiSq sValue;

    FORMAT sValue 6.2;
    FORMAT ChiSq 6.2;
RUN;
TITLE;

Likelihood Ratio Tests

Test DF ChiSq ProbChiSq sValue
Race Effects Vary by Gender 4 44.64 <.0001 27.66
Overall Effect of Race 8 162.82 <.0001 100.94

There is overwhelming evidence that the prevalence of obesity varies by race \((p \approx 0, s=101)\). There is very strong evidence that the race differences in the log odds of obesity vary by gender \((p \approx 0, s=28)\).